Prediction-driven computational auditory scene analysis
نویسنده
چکیده
The sound of a busy environment, such as a city street, gives rise to a perception of numerous distinct events in a human listener – the ‘auditory scene analysis’ of the acoustic information. Recent advances in the understanding of this process from experimental psychoacoustics have led to several efforts to build a computer model capable of the same function. This work is known as ‘computational auditory scene analysis’. The dominant approach to this problem has been as a sequence of modules, the output of one forming the input to the next. Sound is converted to its spectrum, cues are picked out, and representations of the cues are grouped into an abstract description of the initial input. This ‘data-driven’ approach has some specific weaknesses in comparison to the auditory system: it will interpret a given sound in the same way regardless of its context, and it cannot ‘infer’ the presence of a sound for which direct evidence is hidden by other components. The ‘prediction-driven’ approach is presented as an alternative, in which analysis is a process of reconciliation between the observed acoustic features and the predictions of an internal model of the sound-producing entities in the environment. In this way, predicted sound events will form part of the scene interpretation as long as they are consistent with the input sound, regardless of whether direct evidence is found. A blackboard-based implementation of this approach is described which analyzes dense, ambient sound examples into a vocabulary of noise clouds, transient clicks, and a correlogram-based representation of wide-band periodic energy called the weft. The system is assessed through experiments that firstly investigate subjects’ perception of distinct events in ambient sound examples, and secondly collect quality judgments for sound events resynthesized by the system. Although rated as far from perfect, there was good agreement between the events detected by the model and by the listeners. In addition, the experimental procedure does not depend on special aspects of the algorithm (other than the generation of resyntheses), and is applicable to the assessment and comparison of other models of human auditory organization. Thesis supervisor: Barry L. Vercoe Title: Professor of Media Arts and Sciences
منابع مشابه
Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures
Computational auditory scene analysis – modeling the human ability to organize sound mixtures according to their sources – has experienced a rapid evolution as the simple principles suggested by psychological experiments have turned out to be less than the whole story. Phenomena such as the continuity illusion and phonemic restoration show that the brain is able to use a wide range of knowledge...
متن کاملPrediction-driven Computational Auditory Scene Analysis for Dense Sound Mixtures
We interpret the sound reaching our ears as the combined effect of independent, sound-producing entities in the external world; hearing would have limited usefulness if were defeated by overlapping sounds. Computer systems that are to interpret real-world sounds – for speech recognition or for multimedia indexing – must similarly interpret complex mixtures. However, existing functional models o...
متن کاملToward Automatic Sound Source Recognition: Identifying Musical Instruments
One of the broad goals of research in computational auditory scene analysis (CASA) is to create computer systems that can learn to recognize sound sources in a complex auditory environment. In this paper, a set of acoustic features is proposed that relate to the physical properties of sound-producing objects. In particular, a set of orchestral musical instrument sounds is presented as represent...
متن کاملResidue-Driven Architecture for Computational Auditory Scene Analysis
The Residue-Driven Architecture presented here is a model of auditory stream segregation from input sounds. A subsystem to extract auditory streams by using some sound attributes is called an agency and the design of each agency is based on the residue-driven architecture. This architecture consists of three kinds of agents: an event-detector, a tracergenerator, and tracers. The event-detector ...
متن کاملIndependent Study Computational Auditory Scene
Aim To do a literature survey of Computational Auditory Scene Analysis and look for features or techniques t hat can be used for purposes such as discriminating a particular sound (speech in this case) from all the other sounds.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1996